William Gann's Time Series Feature Engineering with tsfresh for Automated Signal Discovery
tsfresh is a Python package that automates feature extraction from time series data. It can generate hundreds of features per series, ranging from simple statistics to spectral-analysis features. For quantitative traders this saves considerable time, because a large and diverse feature set can be produced for a model without hand-coding each feature.
Integrating tsfresh with Scikit-Learn Pipelines
tsfresh provides scikit-learn compatible transformers in `tsfresh.transformers`, so feature extraction, selection, and model training can be combined in a single pipeline. The snippet below runs the extraction step on its own, which is the simplest starting point.
```python
from tsfresh import extract_features
from tsfresh.feature_extraction import ComprehensiveFCParameters
from sklearn.pipeline import Pipeline

# Assume data is a pandas DataFrame with columns: id, time, and value
extracted_features = extract_features(
    data,
    column_id="id",
    column_sort="time",
    default_fc_parameters=ComprehensiveFCParameters(),
)
# Now you can use extracted_features in your Scikit-Learn pipeline
```
Feature Filtering and Selection
tsfresh also provides tools for filtering and selecting the most relevant features. This is important because many of the generated features may be redundant or irrelevant.
```python
from tsfresh.feature_selection.relevance import calculate_relevance_table

# y is the target: a pandas Series indexed by the same ids as extracted_features
relevance_table = calculate_relevance_table(extracted_features, y)
```
A Complete Example
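Let's consider a complete example of using tsfresh to generate and select features for a trading model. After extraction and relevance testing, the relevance table lists each feature with its p-value and whether it was judged relevant: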
| Feature | p-value | Relevant |
|---|---|---|
| value__fft_coefficient__attr_"abs"__coeff_1 | 0.01 | True |
| value__cwt_coefficients__widths_(2, 5, 10, 20)__coeff_1__w_2 | 0.02 | True |
| value__spkt_welch_density__coeff_2 | 0.03 | True |
| ... | ... | ... |
Mathematical Formulation: Fast Fourier Transform (FFT)
One of the features that tsfresh can generate is the FFT coefficient. The FFT is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse. The formula for the DFT is:
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-i 2\pi k n / N}, \qquad k = 0, 1, \dots, N-1$$
Where:
- $X_k$ is the $k$-th DFT coefficient.
- $x_n$ is the $n$-th sample of the input sequence.
- $N$ is the number of samples.
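The DFT definition can be checked by hand on a short sequence. The sketch below evaluates the sum directly in pure Python; it is an O(N²) illustration of the definition, not the FFT algorithm itself, and the four-sample signal is a made-up example.

```python
import cmath

def dft(x):
    """Direct evaluation of X_k = sum_n x_n * exp(-2*pi*i*k*n / N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

# Hypothetical signal: one full cosine cycle sampled at four points.
signal = [1.0, 0.0, -1.0, 0.0]
coefficients = dft(signal)

# Magnitude of each coefficient.
magnitudes = [abs(c) for c in coefficients]
print([round(m, 6) for m in magnitudes])  # [0.0, 2.0, 0.0, 2.0]
```

The energy sits at $k = 1$ and its mirror $k = 3$, as expected for a real signal with a single frequency. A feature name such as `fft_coefficient__attr_"abs"__coeff_1` appears to refer to the magnitude at $k = 1$; since the input is real, only the non-redundant half of the spectrum is needed.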
By using tsfresh to automate the feature engineering process, you can save time and potentially discover new and profitable trading signals.
